On the Relevance of Sophisticated Structural Annotations for Disulfide Connectivity Pattern Prediction
Authors
Abstract
Disulfide bridges strongly constrain the native structure of many proteins, and predicting their formation is therefore a key sub-problem of protein structure and function inference. Most recently proposed approaches to this prediction problem adopt the following pipeline: first, they enrich the primary sequence with structural annotations; second, they apply a binary classifier to each candidate pair of cysteines to predict disulfide bonding probabilities; and finally, they use a maximum weight graph matching algorithm to derive the predicted disulfide connectivity pattern of a protein. In this paper, we adopt this three-step pipeline and propose an extensive study of the relevance of various structural annotations and feature encodings. In particular, we consider five kinds of structural annotations, three of which are novel in the context of disulfide bridge prediction. To be usable by machine learning algorithms, these annotations must be encoded into features. For this purpose, we propose four different feature encodings based on local windows and on different kinds of histograms. The combination of structural annotations with these possible encodings leads to a large number of possible feature functions. To identify a minimal subset of relevant feature functions among those, we propose an efficient and interpretable feature function selection scheme, designed to avoid any form of overfitting. We apply this scheme on top of three supervised learning algorithms: k-nearest neighbors, support vector machines, and extremely randomized trees.
Our results indicate that using only the PSSM (position-specific scoring matrix) together with the CSP (cysteine separation profile) is sufficient to construct a high-performance disulfide pattern predictor and that extremely randomized trees reach a disulfide pattern prediction accuracy of [Formula: see text] on the benchmark dataset SPX[Formula: see text], which corresponds to [Formula: see text] improvement over the state of the art. A web application is available at http://m24.giga.ulg.ac.be:81/x3CysBridges.
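The third step of the pipeline described in the abstract, turning pairwise bonding probabilities into a connectivity pattern, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `best_pattern`, the cysteine positions, and the probability values are hypothetical, and the sketch assumes that every cysteine is bonded (an even number of cysteines). It also swaps in brute-force enumeration of all perfect matchings for a dedicated maximum weight graph matching algorithm, which is practical only for proteins with a handful of cysteines.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (position of first cysteine, position of second cysteine)


def best_pattern(cysteines: List[int], prob: Dict[Pair, float]) -> Tuple[float, List[Pair]]:
    """Return (total weight, pairs) of the maximum weight perfect matching.

    Brute-force stand-in for a maximum weight graph matching algorithm.
    Assumes every cysteine is bonded, so the count must be even.
    """
    if len(cysteines) % 2 != 0:
        raise ValueError("this sketch assumes an even number of cysteines")
    if not cysteines:
        return 0.0, []
    first, rest = cysteines[0], cysteines[1:]
    best_score, best_pairs = float("-inf"), []
    # Pair the first cysteine with each possible partner, then solve the rest.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        score, pairs = best_pattern(remaining, prob)
        score += prob[(first, partner)]
        if score > best_score:
            best_score, best_pairs = score, [(first, partner)] + pairs
    return best_score, best_pairs


# Hypothetical classifier outputs for a protein with four cysteines.
positions = [5, 12, 30, 41]
probabilities = {
    (5, 12): 0.9, (5, 30): 0.2, (5, 41): 0.1,
    (12, 30): 0.3, (12, 41): 0.2, (30, 41): 0.8,
}
score, pattern = best_pattern(positions, probabilities)
print(pattern)
```

With these made-up probabilities, the pattern pairing cysteines 5 and 12 and cysteines 30 and 41 has the highest total weight among the three possible perfect matchings. For larger proteins, a polynomial-time matching routine from a graph library would be the more reasonable choice.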
Similar resources
Disulfide Bonding Pattern Prediction Using Support Vector Machine with Parameters Tuned by Multiple Trajectory Search
The prediction of the location of disulfide bridges helps towards solving the protein folding problem. Most previous work on disulfide connectivity pattern prediction uses prior knowledge of the bonding state of cysteines. In this study, an effective method is proposed to predict the disulfide connectivity pattern without prior knowledge of the cysteines' bonding state. In previous research...
Prediction of Disulfide Bonding Pattern Based on Support Vector Machine with Parameters Tuned by Multiple Trajectory Search
The prediction of the location of disulfide bridges helps to solve the protein folding problem. Most previous work on disulfide connectivity pattern prediction uses prior knowledge of the bonding state of cysteines. In this study, an effective method is proposed to predict the disulfide connectivity pattern without prior knowledge of the cysteines' bonding state. To the best of our knowledge, wi...
DBCP: a web server for disulfide bonding connectivity pattern prediction without the prior knowledge of the bonding state of cysteines
Accurate prediction of the location of disulfide bridges helps to solve the protein folding problem. Most previous work on the prediction of disulfide connectivity patterns uses prior knowledge of the bonding state of cysteines. The DBCP web server provides prediction of disulfide bonding connectivity pattern without the prior knowledge of the bonding state of cyst...
A novel database of disulfide patterns and its application to the discovery of distantly related homologs.
Disulfide bonds are conserved strongly among proteins of related structure and function. Despite the explosive growth of protein sequence databases and the vast numbers of sequence search tools, no tool exists to draw relations between the disulfide patterns of homologous proteins. We present a comprehensive database of disulfide bonding patterns and a search method to find proteins with simila...
A New Method for Predicting Well Pattern Connectivity in a Continental Fluvial-delta Reservoir
Poor flow unit continuity and multiple layers emphasize the importance of well pattern design for the development of a fluvial-delta reservoir. A method is proposed to predict well pattern connectivity (WPC) based on the exploration and evaluation of wells. Moreover, the method helps evaluate the risk of well placement. This study initially establishes the parameters for cha...
Journal title:
Volume 8, Issue
Pages -
Publication date: 2013